140 research outputs found

    Lempel-Ziv Parsing in External Memory

    For decades, computing the LZ factorization (or LZ77 parsing) of a string has been a requisite and computationally intensive step in many diverse applications, including text indexing and data compression. Many algorithms for LZ77 parsing have been discovered over the years; however, despite the increasing need to apply LZ77 to massive data sets, no algorithm to date scales to inputs that exceed the size of internal memory. In this paper we describe the first algorithm for computing the LZ77 parsing in external memory. Our algorithm is fast in practice and will allow the next generation of text indexes to be realised for massive strings and string collections. Comment: 10 pages

    Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet

    Peer reviewed

    Medium-Space Algorithms for Inverse BWT

    Peer reviewed

    Repetition-Based Text Indexes

    ... fast pattern matching queries. The scheme provides a general framework for representing information about repetitions, i.e., multiple occurrences of the same string in the text, and for using the information in pattern matching. Well-known text indexes, such as suffix trees, suffix arrays, DAWGs and their variations, which we collectively call suffix indexes, can be seen as instances of the scheme. Based on th